Lite: fix compaction OOM by setting DuckDB temp_directory (#933) #935

Merged
erikdarlingdata merged 1 commit into dev from feature/933-compaction-temp-directory on May 5, 2026

Conversation

@erikdarlingdata
Owner

Summary

  • The in-memory DuckDB connections used for parquet compaction had a 4 GB memory_limit pragma but no temp_directory, so the cap acted as a hard wall — DuckDB had nowhere to spill and OOM'd the moment it was hit.
  • Set temp_directory to <archive>/duckdb_tmp/ on both compaction connections (small-group and incremental pair-merge). Co-locating with the archive keeps spill writes on the same volume as the parquet files; a minimal sketch follows this list.
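
For illustration only, here is a minimal sketch in the Python duckdb client rather than the project's actual .NET code; the archive path and merge query are assumptions, only the two settings come from this PR:

```python
import duckdb

# Sketch, not the project's code: an in-memory connection used for compaction.
archive_dir = "/data/archive"  # assumption: wherever the parquet archive lives

con = duckdb.connect(database=":memory:")
con.execute("SET memory_limit = '4GB'")
# Without a temp_directory the 4 GB cap is a hard wall: nowhere to spill, so OOM.
con.execute(f"SET temp_directory = '{archive_dir}/duckdb_tmp/'")

# Compaction is roughly "read a group of small parquet files, write one merged file".
con.execute(f"""
    COPY (SELECT * FROM read_parquet('{archive_dir}/group_*.parquet'))
    TO '{archive_dir}/group_merged.parquet' (FORMAT PARQUET)
""")
```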

Verification

Tested end-to-end against 4 monitored SQL Servers under HammerDB load:

  • First 512 MB reset (single-file groups) — 26 groups compacted, no errors.
  • Second 512 MB reset (multi-file groups, the path that OOM'd in #933, "[BUG] Memory usage on client") — 21 groups merged with 2 source files each, completed in ~3.5s, duckdb_tmp/ cleaned up on connection close, no OOM.

Closes #933.

Test plan

  • Builds clean (0 errors / 0 warnings on incremental build)
  • First reset (single-file compaction) succeeds with new code path
  • Second reset (multi-file pair-merge) succeeds — the actual OOM-prone path from #933 ("[BUG] Memory usage on client")
  • duckdb_tmp/ directory created on first archive cycle and cleaned by DuckDB on connection close
  • No regression in collector health during archive cycles
  • Reporter (000al000) confirms fix on their 4-server environment

🤖 Generated with Claude Code

The in-memory DuckDB connections used for parquet compaction had a 4 GB
memory_limit pragma but no temp_directory, so the cap acted as a hard
wall — DuckDB had nowhere to spill and OOM'd the moment it was hit.

Co-locate the spill dir with the archive folder so the writes land on
the same volume as the parquet files. Verified end-to-end: 4-server
HammerDB load, second 512 MB reset triggered ArchiveAllAndResetAsync,
all 21 groups went through the multi-file pair-merge path with two
sources each, completed in ~3.5s with no OOM.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
erikdarlingdata merged commit e0cd22b into dev on May 5, 2026
2 checks passed
MisterZeus pushed a commit to MisterZeus/PerformanceMonitor that referenced this pull request May 8, 2026
…don't OOM

erikdarlingdata#935 added temp_directory so DuckDB could spill, but on wider workloads
the working set still blew past the 4 GB cap before spill caught up
(reporter saw OOM at 3.7 GiB compacting 15 query_snapshots files).
Three knobs combined to feed that:

- memory_limit = 4 GB was too high — DuckDB held off spilling until late
- threads defaulted to N cores, multiplying per-thread row-group buffers
- ROW_GROUP_SIZE 122880 buffered up to 122k wide-VARCHAR rows per group

Drop memory_limit to 1 GB, cap threads to 2, and shrink ROW_GROUP_SIZE
to 8192. On 1.7 M rows of real query_stats data this drops peak working
set from 1236 MB → 166 MB (87% reduction) at a 31% wall-time cost.
Memory now plateaus instead of growing with row count, which is the
load-bearing change for issue erikdarlingdata#933.
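
Roughly, the three knobs map to DuckDB settings like the following. This is a sketch in the Python client; the paths and the merge query are placeholders, and only the three values come from this commit:

```python
import duckdb

con = duckdb.connect(database=":memory:")
con.execute("SET memory_limit = '1GB'")   # was 4 GB; a tighter cap triggers spill earlier
con.execute("SET threads = 2")            # was N cores; fewer per-thread row-group buffers
con.execute("SET temp_directory = '/data/archive/duckdb_tmp/'")  # placeholder path

# A smaller row group bounds how many wide-VARCHAR rows sit in memory
# before each group is flushed to the output parquet file.
con.execute("""
    COPY (SELECT * FROM read_parquet('/data/archive/group_*.parquet'))
    TO '/data/archive/group_merged.parquet'
    (FORMAT PARQUET, ROW_GROUP_SIZE 8192)
""")
```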

Adds tools/CompactionRepro — a standalone reproducer that splits a real
monthly parquet file into N per-cycle-shaped chunks and runs the same
pair-merge logic with the tuning knobs exposed on the command line.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
erikdarlingdata added a commit that referenced this pull request May 12, 2026
#942 lowered the cap to 1 GB on the theory that a tight memory_limit plus
temp_directory would force DuckDB to spill earlier and keep peak working
set down. That validation ran against query_stats (narrow, ~1.7M rows) and
showed peak 1236 MB → 166 MB. The reporter's actual failure is on
query_snapshots, which carries query_text + query_plan + live_query_plan
per row. With the 1 GB cap, the nightly logs show OOM at "906/953 MiB
used" before any merge progress.

The standalone reproducer (tools/CompactionRepro) confirms the cause:
parquet COPY in DuckDB v1.5.2 makes allocations that bypass the buffer
manager and can't be spilled. The cap acts as a hard ceiling for those,
not a spill trigger. Spill on disk = 0 MB across every configuration we
tested (memory_limit 1/2/4 GB, accumulator vs tournament merge, threads
1 vs 2, :memory: vs file-backed DB). The same failure reproduces in
standalone DuckDB CLI v1.5.2, so it's an engine issue — see upstream
issues duckdb#16482 and discussion#10084.

DuckDB's own OOM guide explicitly warns about this case and recommends
memory_limit at 50-60% of system RAM, not a tight cap. 4 GB sits well
inside that range for typical workstation/server hosts and leaves real
headroom on top of the un-spillable allocations.

Reporter's actual file sizes (15-25 chunks of 2-6 MB plus a 35-45 MB
monthly file per group) are well below the level where 4 GB has any
trouble. The reproducer confirms 4 GB succeeds on a synthetic
query_snapshots-shaped dataset of ~1.5 GB with peak working set of
~400 MB; the reporter's data is ~143 MB at worst.

Also updates the stale comment about spilling — temp_directory was set
per #935 but the buffer-manager-bypassing allocations don't use it. The
comment now describes what actually happens.
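
In sketch form (illustrative only, not the project's actual code or comment text), the end state looks like this:

```python
import duckdb

con = duckdb.connect(database=":memory:")
# 4 GB, per DuckDB's OOM guidance of ~50-60% of system RAM rather than a tight cap.
con.execute("SET memory_limit = '4GB'")
# temp_directory is still set, but parquet COPY allocations in DuckDB v1.5.2
# bypass the buffer manager and cannot be spilled, so the cap must leave headroom.
con.execute("SET temp_directory = '/data/archive/duckdb_tmp/'")  # placeholder path
```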

The tools/CompactionRepro changes add --strategy {accumulator|tournament},
--db-mode {memory|file}, --merge-files, --synthetic data generation, and
--cycles for leak testing. These are kept so a future regression in this
area can be reproduced and diagnosed quickly.
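
A hypothetical invocation, for orientation only (the flag names come from this commit; the entry point and the values are assumptions):

```
dotnet run --project tools/CompactionRepro -- \
    --strategy tournament --db-mode memory --synthetic --cycles 3
```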

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>